
A Fast and Accurate Estimator for Large Scale Linear Model via Data Averaging

Neural Information Processing Systems

The asymptotic behavior of the proposed estimation procedure is studied. Our theoretical results show that the proposed method can achieve a faster convergence rate than the optimal convergence rate for sampling methods.
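The abstract does not spell the estimator out; as a rough illustration of the data-averaging idea only (assuming the simplest variant, averaging ordinary least-squares fits over disjoint blocks of the data; the paper's actual weighting and sampling scheme may differ), here is a minimal sketch:

```python
import numpy as np

def averaged_ols(X, y, n_blocks=10, seed=0):
    """Divide-and-conquer OLS: fit each data block separately, then average.

    Illustrative sketch only; the paper's estimator may weight blocks
    differently or combine averaging with subsampling.
    """
    n, _ = X.shape
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(n), n_blocks)
    betas = []
    for block in blocks:
        # Per-block least-squares solution
        beta_b, *_ = np.linalg.lstsq(X[block], y[block], rcond=None)
        betas.append(beta_b)
    return np.mean(betas, axis=0)
```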


Towards Sharper Generalization Bounds for Structured Prediction

Neural Information Processing Systems

In this paper, we investigate the generalization performance of structured prediction learning and obtain state-of-the-art generalization bounds. Our analysis is based on factor graph decomposition of structured prediction algorithms, and we present novel margin guarantees from three different perspectives: Lipschitz continuity, smoothness, and a space capacity condition. In the Lipschitz continuity scenario, we improve the square-root dependency on the label set cardinality of existing bounds to a logarithmic dependence. In the smoothness scenario, we provide generalization bounds that not only depend logarithmically on the label set cardinality but also achieve a faster convergence rate of order $\mathcal{O}(\frac{1}{n})$ in the sample size $n$. In the space capacity scenario, we obtain bounds that do not depend on the label set cardinality and converge faster than $\mathcal{O}(\frac{1}{\sqrt{n}})$. In each scenario, applications are provided to suggest that these conditions are easy to satisfy.
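To make the rate comparison concrete, the generic shapes of a slow-rate and a fast-rate margin bound are sketched below; the capacity terms cap and cap' are placeholders, not the paper's actual factor-graph complexity measures or constants.

```latex
% Slow rate: capacity term decays like 1/sqrt(n)
R(h) \;\le\; \widehat{R}_{\gamma}(h) \;+\; \mathcal{O}\!\left(\frac{\mathrm{cap}(\mathcal{H}, \gamma, \log|\mathcal{Y}|)}{\sqrt{n}}\right)
% Fast rate (e.g., under smoothness): capacity term decays like 1/n
R(h) \;\le\; \widehat{R}_{\gamma}(h) \;+\; \mathcal{O}\!\left(\frac{\mathrm{cap}'(\mathcal{H}, \gamma, \log|\mathcal{Y}|)}{n}\right)
```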





Reviews: A Stein variational Newton method

Neural Information Processing Systems

Summary: SVGD iteratively moves a set of particles toward the target by choosing a perturbative direction in an RKHS that maximally decreases the KL divergence with the target distribution. The paper proposes to add second-order information into SVGD updates; preliminary empirical results show that their method converges faster in a few cases. The paper is well written, and the proofs seem correct. An important reason for using second-order information is the hope of achieving a faster convergence rate. My major concern is the lack of theoretical analysis of the convergence rate in this paper: 1) An appealing property of SVGD is that the optimal decreasing rate equals the Stein discrepancy D_F(q || p), where F is a function set that includes all possible velocity fields.
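For reference, here is a minimal sketch of the first-order SVGD update the review refers to, with an RBF kernel and a standard-normal target; the Newton variant proposed in the paper additionally injects second-order (Hessian) information, which is not shown.

```python
import numpy as np

def rbf_kernel(x, h=1.0):
    """RBF kernel matrix and its gradients with respect to the first argument."""
    diffs = x[:, None, :] - x[None, :, :]          # (n, n, d)
    K = np.exp(-np.sum(diffs ** 2, axis=-1) / (2 * h ** 2))
    gradK = -diffs / h ** 2 * K[:, :, None]        # d/dx_i K(x_i, x_j)
    return K, gradK

def svgd_step(x, grad_logp, step=0.1, h=1.0):
    """One SVGD update: phi(x_i) = mean_j [K(x_j, x_i) grad_logp(x_j) + grad_{x_j} K(x_j, x_i)]."""
    n = x.shape[0]
    K, gradK = rbf_kernel(x, h)
    phi = (K @ grad_logp(x) + gradK.sum(axis=0)) / n
    return x + step * phi

# Usage: push particles toward a standard normal target, grad log N(0, I) = -x.
rng = np.random.default_rng(0)
particles = rng.normal(loc=5.0, size=(50, 2))
for _ in range(200):
    particles = svgd_step(particles, grad_logp=lambda x: -x)
```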


Reviews: Deep Hyperspherical Learning

Neural Information Processing Systems

This paper presents a novel architecture, SphereNet, which replaces the traditional dot product with geodesic distance in its convolution operators and fully-connected layers. SphereNet also constrains the softmax weights to have unit norm for an angular softmax. The results show that SphereNet can achieve superior performance in terms of accuracy and convergence rate, as well as mitigate vanishing/exploding gradients in deep networks. Novelty: Replacing dot-product similarity with angular similarity is widespread in the deep learning literature. That being said, most works focus on using angular similarity for the softmax or loss functions. This paper introduces a spherical operation for convolution, which is novel in the literature as far as I know.
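To make the angular operator concrete, here is a minimal sketch of one angular response on a single patch, assuming a linear response g(theta) = 1 - 2*theta/pi as an example; the paper discusses several response functions and the full convolutional layer, which are not reproduced here.

```python
import numpy as np

def sphere_response(patch, kernel, eps=1e-8):
    """Angular 'convolution' on one patch: respond to the geodesic angle
    between the input patch and the kernel instead of their dot product.

    Uses the linear response g(theta) = 1 - 2*theta/pi as an illustration.
    """
    p, w = patch.ravel(), kernel.ravel()
    cos = np.dot(p, w) / (np.linalg.norm(p) * np.linalg.norm(w) + eps)
    theta = np.arccos(np.clip(cos, -1.0, 1.0))   # geodesic distance on the unit sphere
    return 1.0 - 2.0 * theta / np.pi             # response in [-1, 1]

# A full SphereNet layer would apply this to every sliding window and output channel.
```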


D-SPIDER-SFO: A Decentralized Optimization Algorithm with Faster Convergence Rate for Nonconvex Problems

#artificialintelligence

Decentralized optimization algorithms have attracted intensive interest recently, as they have a balanced communication pattern, especially when solving large-scale machine learning problems. The Stochastic Path-Integrated Differential Estimator Stochastic First-Order method (SPIDER-SFO) nearly achieves the algorithmic lower bound in certain regimes for nonconvex problems. However, whether we can find a decentralized algorithm which achieves a similar convergence rate to SPIDER-SFO is still unclear. To tackle this problem, we propose a decentralized variant of SPIDER-SFO, called decentralized SPIDER-SFO (D-SPIDER-SFO). We show that D-SPIDER-SFO achieves a similar gradient computation cost--that is, O(ϵ^{-3}) for finding an ϵ-approximate first-order stationary point--to its centralized counterpart. To the best of our knowledge, D-SPIDER-SFO achieves state-of-the-art performance for solving nonconvex optimization problems on decentralized networks in terms of computational cost.
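The abstract assumes familiarity with the SPIDER gradient estimator; below is a minimal single-machine sketch of that estimator (function names are placeholders). D-SPIDER-SFO itself additionally averages models and gradient estimators over a decentralized network, which is not shown.

```python
def spider_sfo(grad_full, grad_minibatch, x0, step=0.01, q=50, n_iters=1000):
    """Single-machine SPIDER-SFO sketch (illustrative only).

    grad_full(x)              -- gradient over the full dataset
    grad_minibatch(x, x_prev) -- gradients of the *same* sampled minibatch,
                                 evaluated at x and at x_prev
    """
    x, x_prev = x0.copy(), x0.copy()
    v = grad_full(x)                          # initial full-gradient estimate
    for t in range(1, n_iters + 1):
        if t % q == 0:
            v = grad_full(x)                  # periodic full-gradient refresh
        else:
            g_new, g_old = grad_minibatch(x, x_prev)
            v = v + g_new - g_old             # recursive SPIDER correction
        x_prev, x = x, x - step * v           # (SPIDER-SFO uses a normalized step; omitted here)
    return x
```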


Minimizing Finite Sums with the Stochastic Average Gradient

Schmidt, Mark; Le Roux, Nicolas; Bach, Francis

arXiv.org Machine Learning

We propose the stochastic average gradient (SAG) method for optimizing the sum of a finite number of smooth convex functions. Like stochastic gradient (SG) methods, the SAG method's iteration cost is independent of the number of terms in the sum. However, by incorporating a memory of previous gradient values the SAG method achieves a faster convergence rate than black-box SG methods. The convergence rate is improved from O(1/k^{1/2}) to O(1/k) in general, and when the sum is strongly convex the convergence rate is improved from the sub-linear O(1/k) to a linear convergence rate of the form O(p^k) for p < 1. Further, in many cases the convergence rate of the new method is also faster than black-box deterministic gradient methods, in terms of the number of gradient evaluations. Numerical experiments indicate that the new algorithm often dramatically outperforms existing SG and deterministic gradient methods, and that the performance may be further improved through the use of non-uniform sampling strategies.
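As a concrete illustration of the memory-of-gradients idea, here is a minimal SAG sketch for a generic finite sum; step-size selection, the handling of not-yet-visited components, and the non-uniform sampling mentioned above are simplified or omitted.

```python
import numpy as np

def sag(grad_i, n, x0, step=0.01, n_iters=10000, seed=0):
    """Stochastic Average Gradient sketch.

    grad_i(i, x) -- gradient of the i-th component function at x.
    Keeps a table of the last gradient seen for each component and
    steps along the running average of that table.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    table = np.zeros((n,) + x.shape)    # memory of per-component gradients (zeros until visited)
    avg = np.zeros_like(x)
    for _ in range(n_iters):
        i = rng.integers(n)
        g = grad_i(i, x)
        avg += (g - table[i]) / n       # update the running average in O(dim)
        table[i] = g
        x = x - step * avg
    return x
```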